Xssist Blog

Weak link

A chain is only as strong as its weakest link. On 2 separate occasions in the past 2 months, the sysadmin proved to be the weakest link. The problem was similar in each case: faulty RAM in the server, which we had to replace. Unfortunately, tracing 1 faulty RAM module out of several, e.g. 16 modules in a 32GB server with 2GB modules, can be more difficult than it seems. The OS might indicate which CPU and which slot has the ECC error. The server might have a fault panel that points out the DIMM slot. However, sometimes we just have to swap modules in and out.. each time having to unplug the power cord, KVM cables, network cable, SCSI cables, etc., pull the server out on its rails, open the casing, swap the RAM, put everything back and power on the server. If it works, good. If not, repeat all of the above.
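Before pulling the server out of the rack, it can be worth checking what the OS already knows. The following is a minimal sketch, assuming a Linux server with an EDAC driver loaded for the memory controller and the per-row counters exposed under `/sys/devices/system/edac/mc` (the `csrow` layout; newer kernels may expose per-DIMM directories instead). The function names `read_edac_counts` and `suspect_rows` are made up for this example, and mapping a csrow back to a physical DIMM slot still depends on the motherboard.

```python
from pathlib import Path


def read_edac_counts(edac_root="/sys/devices/system/edac/mc"):
    """Return (controller, csrow, corrected, uncorrected) tuples.

    Assumes the EDAC csrow sysfs layout: mcN/csrowM/ce_count and ue_count.
    Returns an empty list if the directory does not exist
    (no EDAC driver loaded, or not a Linux system).
    """
    results = []
    root = Path(edac_root)
    if not root.is_dir():
        return results
    for csrow in sorted(root.glob("mc*/csrow*")):
        try:
            ce = int((csrow / "ce_count").read_text().strip())
            ue = int((csrow / "ue_count").read_text().strip())
        except (OSError, ValueError):
            # Skip rows whose counters are missing or unreadable.
            continue
        results.append((csrow.parent.name, csrow.name, ce, ue))
    return results


def suspect_rows(counts):
    """Keep only the rows that have logged at least one ECC error."""
    return [c for c in counts if c[2] > 0 or c[3] > 0]


if __name__ == "__main__":
    for mc, row, ce, ue in suspect_rows(read_edac_counts()):
        print(f"{mc}/{row}: corrected={ce} uncorrected={ue}")
```

If a row shows a growing corrected-error count, that narrows the swap-and-reboot cycle down to the modules in that row instead of all 16.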

On one occasion, during boot, error messages started coming up from the SCSI card.. oops, SCSI cable not connected. Sysadmin starts panicking.. what if the configuration gets corrupted? Quick, pull out the power cable.... hmmm... the screen continues scrolling.. oops. Pulled out the power cable of another server.

On a separate occasion, we needed to replace RAM in a server and install an APC fan, ACF002, in the rack to improve air flow. We pushed the fan module too far back into the rack and managed to dislodge an external SCSI cable of a server, whose screws were not tightened (lesson learnt here: tighten all screws for cables, especially for critical ones like the SCSI cable). We did not realise the SCSI cable was dislodged. The first sign of trouble was the fault light turning on for 1 drive out of 6 in a SCSI array configured as RAID 10. Okay, that's not too bad.. just a drive failure, right? Okay, let's just replace the faulty SCSI hard disk.. pull out the faulty disk, insert a new disk. The fault light changed status and the disk started syncing, or so I thought. Then.. the fault lights for 4 out of the 6 drives turned on. Ouch. Losing 4 drives out of 6 is enough for data loss even with RAID 10. Restarted the server and accessed the RAID card menu.. pulled out 3 of the drives marked faulty and replaced them, then tried marking the remaining faulty drives as good, i.e. if 1 drive out of each RAID 1 pair in the RAID 10 is marked as good, the RAID array will be OK. The RAID card could not detect some of the drives. Getting worse and worse.. Checked the SCSI cable.. aha, it's dislodged. Pushed the cable in, tightened the screws, marked 3 of the 6 drives as good.. worked! Resynced the other 3 drives as well.
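To see why 4 failed drives out of 6 is fatal for RAID 10 while 3 can be survivable, here is a small sketch. It assumes the 6 drives are arranged as 3 mirrored pairs (RAID 1) striped together, and the pairing `(0, 1), (2, 3), (4, 5)` is only an illustrative layout, not necessarily how the RAID card grouped the drives. The array survives as long as every pair still has at least 1 good drive.

```python
from itertools import combinations

# Hypothetical layout: 6 drives as 3 RAID 1 pairs, striped together.
PAIRS = [(0, 1), (2, 3), (4, 5)]


def raid10_survives(failed, mirror_pairs=PAIRS):
    """True if no mirror pair has lost both of its drives."""
    failed = set(failed)
    return all(not set(pair) <= failed for pair in mirror_pairs)


def survivable_combinations(num_failed, mirror_pairs=PAIRS):
    """Count failure combinations of a given size that the array survives.

    Returns (survivable, total).
    """
    drives = [d for pair in mirror_pairs for d in pair]
    combos = list(combinations(drives, num_failed))
    ok = sum(1 for combo in combos if raid10_survives(combo, mirror_pairs))
    return ok, len(combos)


if __name__ == "__main__":
    for n in range(1, 5):
        ok, total = survivable_combinations(n)
        print(f"{n} failed: {ok} of {total} combinations survivable")
```

With 3 failures the array only survives if the failed drives are spread 1 per pair, which is why marking 1 drive out of each RAID 1 as good was enough to bring the array back. With 4 failures across 3 pairs, at least 1 pair must have lost both drives, so no combination is survivable.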

Xssist
Nov 08
